Assessment in medical education provides the evidence that learning has taken
place and that the learning objectives have been achieved. An assessment program is a
measurement tool for evaluating students' progress in knowledge, skills, behaviors, and
attitudes. The planning of an effective assessment program should therefore be
based on instructional objectives, instructional activities, and efficient assessment
methods, and a well-designed assessment procedure should be characterized by
validity and reliability. There are two methods for interpreting the results of
students' performance, norm-referenced and criterion-referenced; the first gives a
relative ranking of students while the second describes the learning tasks that students
can and cannot perform. The information obtained from the assessment results
should be used effectively to evaluate, revise, and improve the instructional
course. Therefore, the reporting of the assessment results to stakeholders
should be clear, comprehensive, and understandable to prevent misinterpretation
that may adversely affect students and other stakeholders.


© 2020 The Authors. Published by Iberoamerican Journal of Medicine. This is an open access 
article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


1. INTRODUCTION 

Assessment is a tool for determining the extent to which
students achieve the intended learning outcomes of
instruction; it is considered an integral part of the
instruction process. Moreover, a well-designed and
well-integrated assessment gives a credible picture of the
effectiveness of the instruction process. In addition,
student assessment promotes student motivation, the
development of self-evaluation, and the retention and
transfer of learning [1].


Therefore, the integration of assessment with instruction
should depend on the essential principles of effective
assessment. These principles include clear intended
learning outcomes, the use of a variety of assessment
procedures, the relevance of procedures to instruction, an
adequate sample of student performance, the fairness of
procedures, the judgment of successful performance
according to specific criteria, feedback to the students
about the strengths and weaknesses of their performance so
that it can be corrected, and a comprehensive grading and
reporting system. Thus, the selection of an assessment
method should depend on using the most efficient and
appropriate method for assessing the intended learning
outcomes. Notably, the improvement of student learning is
considered the main objective of the assessment program
[2].

In this context, the planning of student assessment
should be based on instructional objectives, instructional
activities, and assessment methods. The instructional
objectives should describe the intended learning outcomes
in performance terms, where this performance is evidence
of student learning at the end of the learning
experience. Moreover, the revised Bloom's taxonomy of
educational objectives is considered the framework for
identifying these elements via two dimensions;
the first includes six cognitive process categories
(remember, understand, apply, analyze, evaluate, and
create) while the second includes four knowledge
categories (factual, conceptual, procedural, and
metacognitive). This taxonomy helps to prepare the
assessment procedures and instruments in alignment with the
instructional objectives and activities; harmony and
alignment between objectives (intended learning
outcomes), instructional activities, and assessment are the
hallmark of effective planning for student assessment [3].
Importantly, the planning of assessment and the planning of
instruction complement each other, so they should
be planned at the same time in order to answer some
necessary questions that contribute to the success of the
assessment program, such as: To what extent is pretesting
needed? What type of assessment is needed during and
at the end of instruction? Therefore, the preparation of an
achievement test should follow a set of steps that
include specification of the instructional objectives, test
specification, construction of relevant test items,
arrangement of the test items, preparation of clear directions,
review and evaluation of the assembled test,
administration of the test, and test item analysis [4].

In a related context, assessment types may be
classified according to timing. Placement assessment
is a test given at the beginning of the course to identify
the prerequisite skills necessary for successful instruction;
it is a pretest that determines entry performance and covers
the intended learning outcomes of the planned instruction.
Formative assessment (process-focused) is used to
monitor learner progress during instruction by
identifying the strong and weak points of student
performance; it is designed to measure the extent to which
learners have mastered the learning outcomes of a
limited section of instruction, and its results are a
means of improving learning. At the end of
instruction, the extent of learning outcome
achievement and the terminal performance of students
should be measured by summative assessment
(outcome-focused); it is a comprehensive method for
identifying mastery or assigning grades, and it aims to
provide feedback to students and to evaluate the
effectiveness of instruction [5].


2. MAJOR TYPES OF ASSESSMENT 

Assessment includes tests and performance
assessments; tests are classified into selected-response
tests and supply-response tests, while performance
assessments are either restricted or extended.

Selected-response tests measure understanding and
thinking skills; the student chooses the correct or
the best answer (multiple-choice questions (MCQ),
true-false tests, and matching tests). They are in common
use because a large number of selected-response items can
be administered to a group of students in a short time, with
rapid scoring of the results by hand or machine. Their
scoring is completely objective, but they are low in realism
because the student selects the response from a given set of
possible answers, so the response is limited to
the listed alternatives. On the other hand, in
supply-response tests the student responds with a word, a
short phrase, or a complete essay; these tests require more
time for scoring, and their scoring is more subjective, so
personal bias may affect the judgment. They are more
realistic than the selected-response type because they allow
greater freedom of response with a moderate structure
[6].

Restricted performance assessments assess the
performance of a highly structured, limited task (e.g.,
writing a brief paragraph on a given topic); like
supply-response tests, they are more realistic than the
selected-response type because they allow greater freedom
of response with a moderate structure. On the other hand,
extended performance assessments assess a comprehensive and
less structured performance task (e.g., writing a short
story); they are high in realism because they simulate
performance in the real world, integrating ideas and
skills from different learning sources. Notably,
performance assessments are usually time-consuming and
depend on criteria that define quality performance.
Moreover, they are scored with a rating scale or a set of
scoring rubrics based on subjective judgment [7].


3. TYPES OF TESTS 

MCQ are the most useful selection-type items; they can be
designed to measure both simple and complex intended
learning outcomes. An MCQ consists of a stem (problem
situation) and several options (choices); the stem is a
question or an incomplete statement, while the options are
several answers (the correct answer and plausible wrong
answers, which are called distracters). The best-answer form
is another type of multiple-choice item used for more
complex achievement, in which all options are partially
correct but one option is clearly better than the others [8].

To prepare an effective multiple-choice item, the item
should be designed to measure a single intended learning
outcome. Furthermore, the stem of the item should
present a single, clearly formulated problem in simple and
clear language, with as much of the wording as possible
placed in the stem so that the same material is not repeated
in the options. Moreover, the item stem should be stated in
positive form; when negative wording is used, it should be
emphasized by underlining or capitalization and placed near
the end of the statement. The intended answer should be
correct or clearly best, and all options should be
grammatically consistent with the item stem and parallel in
form. Verbal clues that allow the correct answer to be
selected or an incorrect answer to be eliminated should be
avoided, such as similarity of wording between the stem and
the correct answer, stereotyped phraseology in the correct
answer, greater detail in the correct answer, absolute terms
in the distracters (“always, never, all, none”), two
inclusive responses, or two responses that have the same
meaning. Moreover, the length of the correct answer should
vary, and its position should vary randomly. The phrase “all
of the above” should be avoided as an alternative, and the
phrase “none of the above” should be used with extreme
caution. In addition, the difficulty of the item is
controlled by the complexity of the problem in the stem or
by the homogeneity of the alternatives. Each item should be
independent of the other items in the test, should follow
the normal rules of grammar, and should use an efficient
item format [9].

In addition, distracters should be plausible and attractive to
the uninformed; they should be stated in the language of the
student, use good-sounding words, and be similar to the
correct answer in length and complexity of wording.
Distracters should represent common misconceptions or errors
of students; they should be homogeneous and may include
extraneous clues, provided these are not overused. Notably,
any one of the above rules may be broken if doing so
improves the effectiveness of the item, according to the
experience of the test maker in item writing [10].

The matching item is a simple variation of the
multiple-choice item; a shift to matching items is
appropriate when there are a number of similar, related
factors. A matching item consists of a series of stems
(premises) and a series of answers (responses) arranged in
columns, with directions that guide the matching. The
material in a matching item should be homogeneous, and the
list of items should be short, with the brief responses
on the right. Moreover, the number of responses should be
larger or smaller than the number of premises, responses may
be used more than once, and responses should be placed in
alphabetical or numerical order. The directions should
specify the basis for matching and indicate that each
response may be used once, more than once, or not at all.
Finally, the premises should be placed on the same page as
the responses [11].

The extended matching question (EMQ) differs from
the single-best-answer multiple-choice question and is
superior to it for assessing the problem-solving and
clinical reasoning skills of students. It consists of a
theme (symptom, diagnosis, treatment), an options list
(answers), a lead-in statement (question), and two stems (two
clinical problems) [12].


4. HOW TO ASSESS THE PSYCHOMOTOR 
DOMAIN IN MEDICAL EDUCATION? 

The Objective Structured Clinical Examination (OSCE) is
a widely used tool for assessing the psychomotor
domain; it is an examination of competence
(content skills, process skills, and clinical management). It
is considered the standardized tool for assessing clinical
competencies such as history taking, physical
examination, and technical procedures. According to
Miller's pyramid, the OSCE measures the level
called “shows how”; it consists of multiple stations and a
wide sampling of clinical and communication skills, with
many examiners and patients, within a limited time, using
a checklist or global rating scale. It has high
reliability because the use of detailed checklists may
reduce inter-rater variability and thus reinforce
reliability. In addition, the test results depend on direct
observation and repeated measurements, which help the
examiner to assess many different qualitative aspects, such
as the efficiency and skill of the students' performance.
Moreover, the exam is acceptable because
every student performs the same tasks. It is also a valid
exam in terms of content validity (a good sampling of skills
that match the learning outcomes), construct validity, and
the authentic length of the stations [13].

To design a good OSCE, the types of skills to be examined
should be determined in alignment with the learning
objectives of the module, together with the types of
assessment tools (e.g., checklists). Moreover, the number of
stations (10-15 stations), the time per station, and the
length of the examination (10 minutes × 10 stations = 100
minutes) should be determined, and resources such as
examination rooms, manikins, examiners, patients, and
volunteers should be prepared [14]. Furthermore, the marking
scheme should be constructed around discriminating actions
that distinguish between good and poor performance. In
addition, the preparation of instructions for the examiner,
patient, and student is essential. First, the instructions
should outline exactly the required task at every station
for the student, and the marking scheme instructions should
describe for the examiner the actions and performance
expected of the student at every station. Second, they
should outline how the patient and the student are to
interact. Finally, the exam should be evaluated after it is
finished. Notably, the success of an OSCE depends on the
availability of facilities such as manikins and other tools,
examiners, real patients, actors, technical and
administrative teams, and training [15].
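
The timing arithmetic for an OSCE circuit can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `osce_circuit_minutes` and the optional `changeover_minutes` parameter (an allowance for moving between stations) are assumptions introduced here and do not come from the cited guidance.

```python
def osce_circuit_minutes(stations, minutes_per_station, changeover_minutes=0):
    """Return the time one student needs to rotate through every station.

    changeover_minutes is a hypothetical allowance for moving between
    stations; with the default of 0 the result is simply
    stations * minutes_per_station.
    """
    if stations <= 0 or minutes_per_station <= 0 or changeover_minutes < 0:
        raise ValueError("station count and times must be positive")
    return stations * (minutes_per_station + changeover_minutes)


# 10 stations of 10 minutes each, with no changeover allowance
print(osce_circuit_minutes(10, 10))  # 100
```

A planner could vary the station count between 10 and 15 and compare the resulting examination length against the rooms and examiners available.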

Last but not least, the use of short stations in the OSCE
is a controversial issue; some educators
think that it undermines the validity of the test. They
hold this view because short stations
do not allow assessment of other aspects of the “shows how”
level, such as the ability of students to deal with
complicated situations that require the integration of
different skills, such as decision making, drawing
conclusions based on physical examination and investigation,
and case management skills. Thus, according to this view,
the use of short stations becomes limited to technical
skills only. On the other hand, other educators prefer the
use of long stations as an alternative, pointing to the
limited influence of station length on reliability.
Therefore, I think that the best approach is to determine
the assessment task through a good balance of content,
setting the controversy aside, to ensure the authenticity
and efficiency of measurement [16].


5. HOW TO ASSESS THE AFFECTIVE DOMAIN 
IN MEDICAL EDUCATION? 

Performance tasks usually contain
knowledge, skill, and affective components. The affective
domain describes the learning objectives that
address feeling, emotion, and the degree of acceptance or
rejection. Moreover, the affective domain has many
parameters. The first is attitude, an important mental
parameter of the affective domain; it consists of cognition,
affect, behavioral intentions, and evaluation. The second
parameter is motivation, which means the initiation,
direction, and persistence of human behavior; it also
includes the reasons for engaging in a particular
behavior, such as basic needs, an object, a goal, or a
desirable ideal. The third parameter is self-efficacy,
which is a person's perception of his or her
ability to perform in a particular manner [17].

Thus, the affective domain is difficult to assess because it
emphasizes attitudes, feelings, emotions, and values. So, it
should be stated in specific, measurable, observable
objectives that can be translated into quantitative terms.
To this end, the taxonomy of the affective domain classifies
behavioral objectives into observable behaviors:
receiving (accept, attend, recognize),
responding (discuss, complete, examine), valuing (accept,
seek, defend), organization (discriminate, organize,
systematize), and characterization (verify, internalize) [18].
In this context, the assessment of the affective domain
depends on many tools that assess attitudes, interests,
motivations, and self-efficacy. These tools include
self-reports, rating scales, semantic differential scales,
the Thurstone scale, and checklists. A self-report is a
written reflection by an individual on his or her attitude
or feeling toward an idea, a person, or a concept, while
rating scales are a set of categories designed to elicit
quantitative information, such as the Likert scale and the
1-10 rating scale. Semantic differential (SD) scales assess
the personal reaction to specific ideas or concepts in terms
of ratings on bipolar scales, while the Thurstone scale
assesses attitude by determining the position of
favorability on an issue [19].


6. HOW TO ASSESS THE COMPREHENSIVE 
DOMAIN “COMBINED DOMAINS” IN MEDICAL 
EDUCATION? 

Portfolio-based assessment is a viable alternative to
traditional high-stakes testing. It is used for both
summative and formative assessment, and it has value as a
source of self-satisfaction. The portfolio is considered one
of the useful and popular tools for assessing student
performance in undergraduate and postgraduate medical
education; it aims to link the objectives of the
instructional course with clinical experience, which is
recorded in a standardized manner to facilitate learning,
teaching, and assessment [20].

The portfolio is a systematic, selective,
purposeful, and organized collection of student work
(materials) that shows the personal ability of each student
(evidence of performance) and his or her professional
development by measuring growth in knowledge, skills, and
attitudes. Therefore, the content of the portfolio (evidence
of learning achievement) consists of clinical tutor reports,
selected student assignments, a list of attained skills,
evidence of communication skills, assessment results, and
a reflective diary [21].

In this context, portfolios can be divided into two types:
developmental and showcase portfolios. The developmental
type is usually used throughout the instructional course 
(formative) and assesses the student learning progress 
while the showcase type is used at the end of the course 
(summative) and shows the student's best work samples 
and the final level of performance [22]. 

In addition, portfolios have many advantages, such as
assessment of learning progress over time, the positive
effect of showcasing the student's best work, and
greater motivation resulting from comparison between
present and past work. Furthermore, their advantages include
improvement in the self-assessment skills of the student,
support for reflective learning, accommodation of individual
differences, a connection between theory and
practice, communication with students and
parents about learning progress, and increased
collaboration between student and teacher. However, for a
fair judgment, we should note that portfolios have
some disadvantages; they are time-consuming because of
the selection of portfolio entries, periodic review, and
the provision of feedback [23].

To plan portfolios, several steps should be
followed, such as determining the purpose of the portfolio
and the types of entries to be included, and setting
guidelines for the selection and evaluation of entries. In
addition, the procedures for maintaining and using the
portfolio and the criteria for evaluating it
should be determined. Finally, we should distinguish between
evaluation of the portfolio as a structure and
evaluation of the student's performance progress. The
structural evaluation of the portfolio depends on its makeup,
organization, and content, while the overall evaluation of
the student's performance progress as shown in the portfolio
is determined via a rating scale based on assessment of the
learning outcomes. Thus, the holistic rubrics for each
area included in the portfolio determine the final level of
student performance [24].


7. HOW TO DEAL WITH THE ASSESSMENT 
RESULTS? 

Firstly, the assessment results should be summarized
concisely into informative data such as tallies, percentages,
and qualitative data (themes, grouped listings). Secondly,
the assessment results should be shared as a
summary or in a brief report
accompanied by essential information, such as the
rules for identifying a successful student, satisfactory
evidence of success, and the action to be taken for
unsatisfactory results. Moreover, the venues for sharing the
assessment results should be determined by choosing one or
more venues such as web sites, emails, newsletters,
presentations, brochures, posters, or banners [25].

In this context, the reporting of assessment results should
be fair, honest, balanced, objective, useful, and
documented, with appropriate attribution. It
should achieve the greatest impact by using a meaningful,
attractive, and interesting title and headings. Furthermore,
the report should be short and should cascade
from major points to details with informed commentary. In
a related context, the grading of results is also
an essential element because it provides effective feedback
about the learning process and suggestions for its
improvement, provided that the assigned grades are a valid
measure of learner achievement [26].

Notably, performance can be assessed in different
ways, such as essay tests, ratings, and multiple-choice
questions, all of which translate student performance into
grades that represent the extent or degree to which the
intended learning outcomes have been achieved. Therefore,
every medical school should have a clear grading policy for
valid judgment. Moreover, grading may be divided into two
types: absolute grading and relative grading. Absolute
grading is based on a comparison between student performance
and a pre-specified standard of performance, depending on
mastery of learning and the identification of cutoff points,
while relative grading depends on a comparison
between student performance and the performance of the other
group members, in order to rank individuals within the group
[27].
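
The difference between absolute and relative grading can be illustrated with a minimal Python sketch. The letter-grade cut-off points, the student identifiers, and the function names are hypothetical values chosen only for illustration; an actual course would take its cut-offs from its own grading policy.

```python
# Hypothetical cut-off points (percent) for absolute grading.
ABSOLUTE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def absolute_grade(score_percent):
    """Grade against the pre-specified standard, ignoring the group."""
    for cutoff, letter in ABSOLUTE_CUTOFFS:
        if score_percent >= cutoff:
            return letter
    return "F"


def relative_ranks(scores):
    """Rank each student within the group (1 = best); ties share a rank."""
    return {
        name: 1 + sum(other > score for other in scores.values())
        for name, score in scores.items()
    }


scores = {"student_1": 85, "student_2": 72, "student_3": 72, "student_4": 58}
absolute = {name: absolute_grade(s) for name, s in scores.items()}
ranks = relative_ranks(scores)
print(absolute)  # letter grades depend only on the fixed cut-offs
print(ranks)     # ranks depend only on the other students' scores
```

Note that the absolute grade of a student would not change if the rest of the group scored differently, whereas the rank would.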

In addition, the validity of the grading system should be
based on the efficacy and fairness of the assigned grades.
Therefore, some guidelines should be applied
when designing the grading system. First,
students should be made aware of the grading system for
course achievement at the beginning of the course,
including the components of assessment, the weight of every
test grade, and the description of every letter grade.
These guidelines should be written in detail in
the study guide of every module. Second, grades should
be based on student achievement only, without adding
extraneous factors such as effort or misbehavior. Third,
grades should also be based on a variety of valid
assessment data covering all learning outcomes, and these
results should be included in the final grade to increase
the validity of the grade. Fourth, a weighting method should
be used to combine scores into the grade, with the selection
of a suitable frame of reference for grading. Finally,
borderline cases should be reviewed by re-examining
all achievement evidence [28].
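
The weighting method for combining component scores can be sketched as a weighted mean. The component names and weights below are hypothetical and serve only as an example; the function name `weighted_final_score` is likewise an assumption of this sketch.

```python
def weighted_final_score(component_scores, weights):
    """Combine component scores (in percent) into one weighted final score."""
    if set(component_scores) != set(weights):
        raise ValueError("every component needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(component_scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical components of a module grade and their announced weights
scores = {"mcq": 80, "osce": 70, "portfolio": 90}
weights = {"mcq": 50, "osce": 30, "portfolio": 20}
final = weighted_final_score(scores, weights)
print(final)  # 79.0
```

Dividing by the total weight means the weights do not have to sum to 100, which makes it harder to introduce an error when a component is added or removed.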

Interpretation of the results or test scores is an
important step in dealing with the assessment results;
it is considered a translation of the quantitative
data into an equivalent numerical set, a process of score
analysis that gives the scores meaning. Notably, there
are different types of scores; the first is the raw score,
which is the number of points received on the test and has
no meaningful interpretation on its own, while the second is
the scaled score, which is a transformation of the result
onto a consistent scale. In addition, test score
interpretation should depend on a referencing framework,
which is a structure for comparing student performance to
something external to the assessment itself, such as a
comparison of the student's score to a predetermined
standard of performance (standard criteria) [29].

Thus, the referencing framework for test score
interpretation may be a criterion-referenced framework or a
norm-referenced framework. The criterion-referenced
framework describes individual performance on
the test without reference to the performance of others;
the criterion is the domain of performance against which
the student's assessment results are referenced.
This interpretation is meaningful when the test is designed
specifically for this purpose. Test performance in
criterion-referenced assessment can be measured by the
speed of performance (task performance within a fixed
time), the degree of performance accuracy, the percentage
(the proportion of the maximum points gained), such as
the percentage of correct answers or the percentage
of learning objectives achieved, quality rankings
(the quality level of performance, such as an excellent
rating of 4 or a good rating of 3), the percentage-correct
score (the standard for judging mastery of the
learning objectives), and the expectancy table (which
interprets the raw score in terms of expected performance)
[30].
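
A criterion-referenced interpretation based on the percentage-correct score can be sketched as follows. The 80% mastery standard is a hypothetical value used only for illustration, and the function names are assumptions of this sketch.

```python
def percent_correct(points_earned, points_possible):
    """Return the percentage of the maximum points that the student gained."""
    if points_possible <= 0 or not 0 <= points_earned <= points_possible:
        raise ValueError("points must satisfy 0 <= earned <= possible, possible > 0")
    return 100.0 * points_earned / points_possible


def has_mastered(points_earned, points_possible, mastery_percent=80.0):
    """Compare the student's score with a pre-set standard, not with peers."""
    return percent_correct(points_earned, points_possible) >= mastery_percent


print(percent_correct(42, 50))  # 84.0
print(has_mastered(42, 50))     # True
print(has_mastered(35, 50))     # False
```

The judgment for each student depends only on that student's score and the pre-set standard, which is what distinguishes this interpretation from a norm-referenced one.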

The norm-referenced framework compares an
individual's test score with the test scores of other
students who took the same test. Therefore, it determines
the student's standing in the reference group; the student's
score is not treated individually but in relation to the
group. Moreover, norm-referenced scores depend on a
mathematical transformation of the raw score, because
the raw score in the norm-referenced framework is not
valid for interpreting student performance. So, it
should be converted into a derived score, which is a
numerical report of test performance on a score scale.
The percentage of the norm group that scored below a
particular raw score is known as the percentile rank; it is
different from the percentage of correctly answered items,
which is a criterion-referenced interpretation.
Developmental scores or scales are norm-referenced scores
that identify the development of students across various
grade or age levels; the grade equivalent score
expresses a particular raw score as the grade level to
which it corresponds. Standardized scores
are transformed scores that allow test
performance to be compared across two or more different
measures; they are divided into linear standard scores and
normalized standard scores. Linear standard
scores (Z-scores and T-scores) allow comparison between two
distributions of test performance and maintain the same
distribution shape as the corresponding raw scores, while
normalized standard scores (stanines and deviation IQ
scores) rely on the known characteristics of the normal
distribution for their interpretation and convert the
distribution of raw scores into a normal distribution.
Finally, it should be remembered that all norm-referenced
scores contain error because no test is a perfect measure
[31].
Finally and conclusively, there is no gold standard for
standard setting in assessment. As mentioned above,
there are two types of standard-setting
methods: the criterion-referenced or absolute method, in
which the standard does not depend on the test results
(independent), and the norm-referenced or relative method,
in which the standard is based on the test results.
The norm-referenced standard is considered the method of
choice for ranking examinees, while the criterion-referenced
standard is considered the most appropriate for determining
whether examinees' mastery of a specific domain meets
the pre-set requirements. Regrettably, both standard-setting
approaches have disadvantages that diminish their
credibility because they can lead to widely divergent
results on the same test. The criterion-referenced method
with a pre-fixed cut-off score leads to a large variation in
failure rates, while the norm-referenced method leads to a
large variation in cut-off scores. In addition, the
procedures of criterion-referenced standard setting require
panels to determine a minimum acceptable level per test
item, and these procedures are time-consuming
and costly. So, because panels cannot be used regularly
for standard-setting procedures, cut-off scores are
established in the form of a pre-fixed percentage of
correctly answered test questions.
However, merging a pre-fixed cut-off score with a relative
point of reference as a compromise method may reduce the
disadvantages of the conventional criterion-referenced and
norm-referenced methods while making optimal use of their
advantages [32].
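
The contrast described above, a fixed cut-off producing variable failure rates and a relative cut-off producing variable cut-off scores, can be made concrete with a small Python sketch. The two cohorts are invented data, and the relative rule used here (the group mean minus one standard deviation) is only an assumed example of a norm-referenced standard; it is not the compromise method of the cited work.

```python
from statistics import mean, pstdev


def failure_rate(scores, cutoff):
    """Percentage of examinees who score below the cut-off."""
    return 100.0 * sum(s < cutoff for s in scores) / len(scores)


def relative_cutoff(scores, sd_below_mean=1.0):
    """Assumed norm-referenced rule: group mean minus a number of SDs."""
    return mean(scores) - sd_below_mean * pstdev(scores)


# Invented cohorts taking tests of different difficulty
cohort_hard_test = [50, 50, 70, 70]
cohort_easy_test = [70, 70, 90, 90]

# Pre-fixed cut-off of 65%: the failure rate swings between cohorts
print(failure_rate(cohort_hard_test, 65), failure_rate(cohort_easy_test, 65))

# Relative cut-off: the failure rate is stable but the cut-off itself moves
print(relative_cutoff(cohort_hard_test), relative_cutoff(cohort_easy_test))
```

With the fixed cut-off, half of the first cohort fails and none of the second; with the relative rule, the cut-off shifts by 20 points between the cohorts.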

So, every educational institution should have a vision for
the interpretation of assessment results; this vision
should determine the benchmarks or standards on which the
interpretation of assessment results is based.
A benchmark or standard may be local, external, internal,
value-added, based on historical trends, based on a
strengths and weaknesses perspective, or based on capability
or productivity. According to the benchmark or standard
chosen, we can compare our students with their peers inside
or outside the institution at a national or international
level and determine the extent of improvement achieved by
the students or the educational program, the strong and weak
points, and the capability and productivity of the students.
However, some schools adopt standardized achievement tests
that depend on the norm-referenced approach to interpret
their results. Such a test compares student performance to
the performance of a representative sample of students in
the norm group at a regional or national level; it is
designed to determine the extent to which students achieve a
common set of goals. So, some guidelines should be
applied when standardized achievement tests are
constructed. First, the test content should be based
on many widely used textbooks, and the test items should
be constructed by test experts and subject matter experts.
Moreover, the test items should be selected according
to the test specifications and then reviewed and
analyzed for difficulty, with rigid directions provided for
the test. In addition, the test scores should be interpreted
according to the norm-referenced framework, and the
test manual should include the procedures for scoring,
interpretation, and the use of results. Finally, we can
modify a standardized achievement test and interpret its
scores according to the criterion-referenced framework if
we modify the multiple-choice items and add open-ended
performance tasks [33].

Noteworthy, the percentage-correct score is considered one 
of the best methods of reporting criterion-referenced test 
results, as it gives the percentage of test items answered 
correctly. However, the norm-referenced scores used with 
standardized tests are of different types, such as 
percentile ranks, grade equivalent scores, and standard 
scores. The percentile rank differs from the percentage of 
correct answers (criterion-referenced) because it indicates 
the student’s relative position in the group as the 
percentage of students scoring below that student, while 
the grade equivalent score expresses relative test 
performance as a grade level. Standard scores depend on 
statistics such as the mean and standard deviation of the 
set of scores [34]. 
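
The difference between these score types can be illustrated 
with a minimal Python sketch. The function names and the 
sample scores are hypothetical and are not taken from the 
cited source; the percentile rank is computed here as the 
percentage of the norm group scoring below the student, 
which is only one of several definitions in use, and the 
standard score shown is a simple z-score. 

```python
from statistics import mean, pstdev


def percent_correct(raw_score, n_items):
    """Criterion-referenced report: share of items answered correctly."""
    return 100.0 * raw_score / n_items


def percentile_rank(raw_score, group_scores):
    """Norm-referenced report: percentage of the norm group scoring
    strictly below raw_score (assumed definition; others count ties)."""
    below = sum(1 for s in group_scores if s < raw_score)
    return 100.0 * below / len(group_scores)


def z_score(raw_score, group_scores):
    """Standard score: distance from the group mean in SD units."""
    return (raw_score - mean(group_scores)) / pstdev(group_scores)


# Hypothetical raw scores of a norm group on a 20-item test.
norm_group = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20]
student = 18

print("Percent correct:", percent_correct(student, 20))
print("Percentile rank:", percentile_rank(student, norm_group))
print("z-score:", round(z_score(student, norm_group), 2))
```

In this made-up example the same raw score of 18 is reported 
as 90 percent correct but as a percentile rank of 60, which 
shows why the two figures should not be confused in a 
report. 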

On the other hand, assessment feedback is important for 
stakeholders such as students, parents, and the educational 
authority. For students and parents, it shows the level of 
achievement and the position of the student among peers. In 
addition, it allows governmental educational administrators 
to evaluate the instruction and learning process, the 
extent to which learning outcomes are achieved, and the 
success of the educational policy of the medical school. 
Thus, we should use a detailed reporting system that 
describes performance on the learning outcomes of the 
course [35]. 

In the end, the report of results should be comprehensive, 
well organized, and free of unnecessary length and 
confusion; it should rate the performance and be 
informative with respect to the list of specific learning 
outcomes. However, the choice of report format depends on 
the report material and audience. So, we can use a full 
report as a complete record of assessment activities, or an 
assessment summary in the form of a note, brochure, or 
flyer to highlight particular findings or specific issues. 
Thus, the components of the assessment report should 
include a description of activities, interpretation of 
results, and suggestions. Moreover, the audience or 
stakeholders should be identified before the content, 
format, and method of reporting are decided, because each 
stakeholder, such as an accrediting organization, higher 
education commission, medical education committee, 
students, or parents, needs content and style suited to its 
scope. Furthermore, the assessment results may be used for 
curriculum evaluation and revision, accreditation, or 
employment. Therefore, web reporting is considered an 
easily accessible option that reaches a wide range of 
audiences [36]. 

At last, we would like to mention that communication of 
the assessment results should be clear, understandable, 
interesting, explainable, and appropriate for the content. 
Thus, the results may be presented as a chart, table, or 
graph according to the available data. Effective tables and 
charts should have a meaningful, self-explanatory title and 
content, with a clear label for every table or chart. 
Moreover, the results should be classified into groups if 
they are numerous, and it should be easy for readers to 
detect differences and trends. Finally, confidentiality in 
reporting assessment results is a mark of the credibility 
of the assessment process for its participants [37]. 


8. HOW TO DESIGN A SUCCESSFUL 
ASSESSMENT PROGRAM? 

Continuing with what we started, we can summarize the 
ingredients for designing a successful assessment program 
for medical students. First, the rules and procedures of 
assessment should be made clear to the students at the 
beginning of the module; they should also be included in 
the study guide of the module. Secondly, a well-designed 
assessment procedure characterized by validity and 
reliability should be used. Validity means the 
appropriateness and meaningfulness of the inferences drawn 
from the assessment results for the intended use. It should 
include content validity, which means that the assessment 
is representative of the learning objectives, and the 
congruence of the assessment instrument with its purpose 
(construct validity). Moreover, it should also include 
predictive validity, which means the ability of the 
instrument to predict future performance. Reliability is 
the consistency of the assessment results, whether they are 
interpreted within the norm-referenced or the 
criterion-referenced framework; it is a necessary 
prerequisite for a valid test. Noteworthy, a highly 
reliable test is not necessarily valid. In addition, 
reliability can be divided into several types. Inter-rater 
reliability means the consistency of the performance 
ratings given by different examiners (raters), while 
inter-case reliability is a measure of the consistency of 
the student’s performance from one case to another under 
consistent conditions. Furthermore, test-retest reliability 
is measured by the correlation between the scores from one 
administration of the test and the scores from another; it 
is an indicator of consistency over time. Worthwhile, 
increasing the testing time and the number of questions are 
considered ways to improve examination reliability. In a 
related context, the acceptability of the instrument to its 
users reflects whether it appears to measure what it is 
supposed to measure (face validity). In addition, the 
utility of an assessment instrument depends on its 
reliability, validity, educational impact, cost, and 
acceptability [38]. 
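
Two of the points above can be made concrete with a short 
Python sketch, offered as an illustration under stated 
assumptions rather than as the procedure of the cited 
source. Test-retest reliability is estimated as the Pearson 
correlation between two administrations of the same test, 
and the effect of lengthening a test is projected with the 
Spearman-Brown formula, a standard psychometric result that 
the article itself does not name. The function names and 
scores are hypothetical. 

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_brown(reliability, length_factor):
    """Projected reliability when the test is made length_factor
    times longer with comparable items (Spearman-Brown formula)."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


# Hypothetical scores of five students on two sittings of one test.
first_sitting = [55, 62, 70, 48, 81]
second_sitting = [58, 60, 74, 50, 79]

print("Test-retest estimate:", round(pearson_r(first_sitting, second_sitting), 2))
print("Reliability 0.60, test doubled:", spearman_brown(0.60, 2))
```

Under the formula's assumption that the added questions are 
comparable to the existing ones, doubling a test with a 
reliability of 0.60 gives a projected reliability of 0.75. 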

Thirdly, the choice of an assessment instrument for any 
examination should depend on the multiple levels of 
clinical competence suggested by Miller (Miller’s Pyramid). 
MCQs, essays, and oral exams are suitable instruments to 
test knowledge (knows), while clinical scenario-based MCQs, 
oral exams, and extended matching items are suitable 
assessment instruments to test understanding and concept 
building (knows how). Moreover, the OSCE and the 
standardized patient are suitable to test performance 
(shows how), while the performance log (logbook), 
checklist, and portfolio are suitable to test performance 
of the task concerned in a real-life situation (does). 
Thus, one or two assessment instruments should be chosen 
from each level to reflect the real ability of the examinee 
[39]. 

Fourthly, blueprinting should be used to specify the 
objectives to be tested and to determine their relative 
weight in the examination. The table of specifications is 
the blueprint of the test; it identifies the types of test 
items that should be included in the test according to the 
time spent on every objective and its cognitive level. So, 
the summative test should be aligned with the subject 
matter studied and the cognitive processes used during 
instruction. Worthwhile, the table of specifications 
improves the validity of the test, which rests on the 
quality of the evidence (test content and response 
process); the test content is the subject matter studied, 
while the response process is the kind of thinking required 
in the test. In addition, there are many approaches to 
developing and using the table of specifications; one of 
them depends on selecting the learning outcomes to be 
tested, whereby the learning objectives are selected and 
placed according to the levels of Bloom’s taxonomy in the 
cognitive domain [40]. 
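
As a minimal sketch of how the time-based weighting in a 
table of specifications might be computed, the Python 
function below distributes a fixed number of test items 
across objectives in proportion to the instructional time 
spent on each. The objective names, hours, and the rounding 
rule (largest remainder) are assumptions made for the 
example and do not come from the cited source; the cognitive 
level of each objective would be recorded alongside these 
counts. 

```python
def allocate_items(hours_by_objective, total_items):
    """Split total_items across objectives in proportion to the
    instructional hours, using largest-remainder rounding so the
    counts always add up to total_items."""
    total_hours = sum(hours_by_objective.values())
    exact = {
        name: total_items * hours / total_hours
        for name, hours in hours_by_objective.items()
    }
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the largest fractional shares.
    by_fraction = sorted(
        exact, key=lambda name: exact[name] - counts[name], reverse=True
    )
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical module: instructional hours per learning objective.
hours = {"Objective A": 6, "Objective B": 3, "Objective C": 1}
blueprint = allocate_items(hours, total_items=20)

for objective, n_items in blueprint.items():
    print(objective, n_items)
```

With these made-up hours, a 20-item test would carry 12, 6, 
and 2 items for the three objectives respectively. 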

Fifthly, a referencing framework should be applied to 
obtain accurate and useful interpretations of results. 
Norm-referenced interpretation is survey testing that 
measures individual differences in achievement; it depends 
on the performance of other students to determine whether a 
given student passes or fails. On the other hand, 
criterion-referenced interpretation is mastery testing that 
describes the tasks the student can perform by comparing 
his performance with a specific achievement domain; it 
depends on a predetermined level of knowledge or skill 
required for passing the exam. Noteworthy, the 
criterion-referenced framework does not depend on the 
performance of other examinees but only on the performance 
of the particular examinee [41]. 
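
The contrast between the two frameworks can be sketched in 
Python as follows. This is an illustration only: the fixed 
cut score of 60 and the relative rule of one standard 
deviation below the cohort mean are hypothetical choices 
made for the example, not rules proposed by the cited 
source. 

```python
from statistics import mean, pstdev


def criterion_referenced_pass(score, cut_score):
    """Pass if the examinee reaches a fixed, predetermined level;
    the scores of other examinees play no role."""
    return score >= cut_score


def norm_referenced_pass(score, cohort_scores, sd_below_mean=1.0):
    """Pass if the examinee is no more than sd_below_mean standard
    deviations below the cohort mean (assumed relative rule)."""
    threshold = mean(cohort_scores) - sd_below_mean * pstdev(cohort_scores)
    return score >= threshold


# Hypothetical cohort that found the examination difficult.
cohort = [40, 45, 50, 55, 60]
student_score = 45

print("Criterion-referenced:", criterion_referenced_pass(student_score, cut_score=60))
print("Norm-referenced:", norm_referenced_pass(student_score, cohort))
```

In this made-up cohort the same score fails against the 
fixed cut score but passes relative to the group, which is 
the kind of discrepancy that the choice of framework should 
anticipate. 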

In addition, standard setting may be used, which defines a 
boundary score that separates those who perform well from 
those who do not. The credibility of the standard varies 
according to who sets the standard, the characteristics of 
the method used, and the outcome. In the end, the 
assessment should be feasible, which depends on the 
availability of resources such as time for test 
development, test administration, and analysis of papers, 
the availability of training for examiners, and the costs 
[42]. 


9. CONCLUSIONS 

Assessment in medical education is a tool to evaluate the 
learning process through the assessment of students. The 
assessment program evaluates the medical student in 
different domains, such as the cognitive, psychomotor, and 
affective domains, using selected-response tests, 
supply-response tests, and restricted or extended 
performance assessments. So, the planning of a 
well-designed assessment program should be based on the 
ingredients of success described above, and the program 
should be characterized by validity and reliability. 
Moreover, the interpretation and reporting of assessment 
results to stakeholders should be clear, comprehensive, and 
understandable so that different stakeholders can evaluate 
and revise the instructional course effectively for further 
improvement. 